Abstract: We describe a method for planning under uncertainty
for robotic manipulation of objects by partitioning the
configuration space into a set of regions that are closed under 
compliant motions. These regions can be treated as states in a 
partially observable Markov decision process (POMDP), which 
can be solved to yield optimal control policies under uncertainty. 
We demonstrate the approach on simple grasping problems, 
showing that it can construct highly robust, efficiently executable 
solutions. 

In this paper we describe some initial experimentation with a 
Barrett arm and a Barrett hand, instrumented with some contact
sensors of our own design. 

This paper is based on [7]. 

I. Introduction 

A great deal of progress has been made on the problem of 
planning motions for robots with many degrees of freedom 
through free space [8, 11]. These methods enable robots to 
move through complex environments, as long as they are not 
in contact with the objects in the world. However, as soon as 
the robot needs to contact the world, in order to manipulate 
objects, for example, open-loop strategies are not sufficiently 
robust. The fundamental problem with planning for motion in 
contact is that the configuration of the robot and the objects in 
the world is not exactly known at the outset of execution, and, 
given the resolution of sensors, it cannot be exactly known. 
In such cases, traditional open-loop plans (even extended with 
simple feedback) are not reliable. 

An early approach to planning in the presence of uncertainty 
was developed in [12]. They used a worst-case model of 
sensor and motion error, and developed a framework for 
computing conservative plans under these assumptions. This 
method was computationally complex, and prone to failure 
due to overconservatism: if there was no plan that would 
work for all possible configurations consistent with the initial 
knowledge state, then the entire system would fail. 

In this paper, we build on those ideas, addressing the 
weaknesses in the approach via abstraction and probabilistic 
representation. By modeling the initial uncertainty using a 
probability distribution, rather than a set, and doing the same 
for uncertainties in dynamics and sensing, we are in a position 
to make trade-offs when it is not possible to succeed in every 
possible situation. We can choose plans that optimize a variety 
of different objective functions involving those probabilities, 
including, most simply, the plan most likely to achieve the 
goal. The probabilistic representation also affords an opportunity
for enormous computational savings through a focus on
the parts of the space that are most likely to be encountered. 
By building an abstraction of the underlying continuous 
configuration and action spaces, we lose the possibility of 
acting optimally, but gain an enormous amount in computa- 
tional simplification, making it feasible to compute solutions 
to real problems. Concretely, we will use methods of model 
minimization to create an abstract model of the underlying 
configuration space, and then model the problem of choosing 
actions under uncertainty as a partially observable Markov 
decision process [19]. 

II. Background and Approach 

The approach we outline here applies to any domain in 
which a robot is moving or interacting with other objects 
and there is non-trivial uncertainty in the configuration. In 
this paper, we concentrate on the illustrative problem of a 
robot arm and hand performing pick-and-place operations. 
We assume that the robot's position in the global frame is 
reasonably well known, but that there is some uncertainty 
about the relative pose and/or shape of the object to be 
manipulated. Additionally, we assume that there are tactile 
and/or force sensors on the robot that will enable it to perform 
compliant motions and to reasonably reliably detect when 
it makes or loses contacts. In the particular experiments we 
report, there is no shape uncertainty, and the objects cannot 
slide or tip, but we plan, in future, to apply the approach 
to those problems, as well. We frame this problem primarily 
as a planning problem. That is, we assume that a reasonably 
accurate model of the task dynamics and sensors is known, 
and that the principal uncertainty is in the configuration of the 
robot and the state of the objects in the world. In future work, 
we will address the problem of learning underlying models 
from experience. 

There is a great deal of relevant previous work on this 
problem in the robotics literature [8, 11, 12, 3, 4, 9, 10, 1, 
6, 18, 13, 14]; we survey this literature in more detail in our 
previous paper [7]. There is also a long history of applying 
POMDPs to mobile-robot navigation [2, 17, 21, 15, 16, 20]. 

A. POMDPs 

Partially observable Markov decision processes
(POMDPs) [19] are the primary model for formalizing
decision problems under uncertainty. A POMDP model
consists of finite sets of states $S$, actions $A$, and observations
$O$; a reward function $R(s, a)$ that maps each underlying
state-action pair into an immediate reward; a state-transition
model $P(s' \mid s, a)$ that specifies a probability distribution over
the resulting state $s'$, given an initial state $s$ and action $a$; and
an observation model $P(o \mid s)$ that specifies the probability of
making an observation $o$ in a state $s$.
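As a minimal sketch of how such a model might be stored, the following Python container holds the three tables as dense arrays. The class name `POMDP`, the array layout, and the toy numbers in `make_toy_pomdp` are all assumptions made for illustration; they are not taken from the experiments in this paper.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class POMDP:
    """Tabular POMDP. The array layout is an assumption of this sketch."""
    T: np.ndarray  # T[a, s, s2] = P(s2 | s, a), state-transition model
    Z: np.ndarray  # Z[s, o]     = P(o | s), observation model
    R: np.ndarray  # R[s, a]     = immediate reward

    def validate(self) -> None:
        """Check shapes and that every conditional distribution sums to 1."""
        n_actions, n_states, _ = self.T.shape
        assert self.T.shape == (n_actions, n_states, n_states)
        assert self.Z.shape[0] == n_states
        assert self.R.shape == (n_states, n_actions)
        assert np.allclose(self.T.sum(axis=2), 1.0)
        assert np.allclose(self.Z.sum(axis=1), 1.0)


def make_toy_pomdp() -> POMDP:
    """Two states, two actions, two observations; numbers are illustrative only."""
    T = np.array([[[0.9, 0.1], [0.0, 1.0]],
                  [[1.0, 0.0], [0.2, 0.8]]])
    Z = np.array([[0.8, 0.2], [0.3, 0.7]])
    R = np.array([[-1.0, -1.0], [10.0, -1.0]])
    return POMDP(T=T, Z=Z, R=R)
```

Dense arrays are adequate for abstract state spaces of the size reported later (tens to hundreds of states); a sparse layout would be a natural alternative for larger models.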

Given the model of a POMDP, the problem of optimal 
control can be broken into two parts: state estimation, in 
which a probability distribution over the underlying state of 
the world, or belief state, is recursively estimated based on the 
actions and observations of the agent; and policy execution, in 
which the current belief state is mapped to the optimal control 
action. 

Belief-state update is a straightforward instance
of a Bayesian filter. The robot's current state
estimate is an n-dimensional vector, $b_t$, representing
$\Pr(s_t \mid o_1 \ldots o_t, a_1 \ldots a_{t-1})$, a probability distribution over
current states given the history of actions and observations
up until time $t$.
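The Bayesian filter step can be sketched in a few lines of Python. The function name `belief_update` and the array layout (`T[a, s, s2]` for the transition model, `Z[s, o]` for the observation model) are assumptions of this sketch, chosen to match the model definitions above.

```python
import numpy as np


def belief_update(b, a, o, T, Z):
    """One Bayes-filter step: predict with action a, then correct with observation o.

    b: belief over states, shape (n,)
    T: T[a, s, s2] = P(s2 | s, a)   (assumed layout)
    Z: Z[s, o]     = P(o | s)       (assumed layout)
    """
    predicted = b @ T[a]                # sum over s of b(s) * P(s2 | s, a)
    unnormalized = predicted * Z[:, o]  # weight by likelihood of the observation
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return unnormalized / total


# Usage: a stationary action and an informative observation.
T = np.array([[[1.0, 0.0], [0.0, 1.0]]])
Z = np.array([[0.8, 0.2], [0.3, 0.7]])
b1 = belief_update(np.array([0.5, 0.5]), a=0, o=0, T=T, Z=Z)
```

The cost of each update is quadratic in the number of states, which is negligible for the abstract models considered here.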

The problem of deriving an optimal policy is much more 
difficult. The policy for a POMDP with n states is a mapping 
from the n-dimensional simplex (the space of all possible 
belief states) into the action set. Although a policy specifies 
only the next action to be taken, the actions are selected in 
virtue of their long-term effects on the agent's total reward. 
Generally, we seek policies that choose actions to optimize 
either the expected total reward over the next k steps (finite- 
horizon) or the expected infinite discounted sum of reward, in 
which each successive reward after the first is devalued by a 
discount factor of $\gamma$.
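Written out, the two objectives take the following form. The symbol $r_t$ for the reward received at step $t$ is notation introduced here for illustration, the discount factor is written $\gamma$, and $0 \le \gamma < 1$ is assumed so that the infinite sum is finite.

```latex
\[
  \text{finite horizon: } \mathbb{E}\Big[\sum_{t=0}^{k-1} r_t\Big],
  \qquad
  \text{infinite horizon: } \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t} r_t\Big].
\]
```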

These policies are quite complex because, unlike in a 
completely observable MDP, in which an action has to be 
specified for each state, in a POMDP, an action has to be 
specified for every probability distribution over states in the 
space. Thus, the policy will know what to do when the robot 
is completely uncertain about its state, or when it has two 
competing possibilities, or when it knows exactly what is 
happening. 

Computing the exact optimal finite or infinite-horizon so- 
lution of a POMDP is generally extremely computationally 
intractable. However, it is often possible to derive good ap- 
proximate solutions by taking advantage of the fact that the set 
of states that are reachable under a reasonable control policy 
is typically dramatically smaller than the original space [15, 
21, 20]. 

B. State and action abstraction 

Robot manipulation problems are typically framed as having 
high-dimensional continuous configuration spaces, multidi- 
mensional continuous action spaces (positions or torques), 
possibly continuous time, and deterministic dynamics. Our 
approach will be to construct discrete abstractions of the 
robot's state and action spaces, and to "make up" for the 
precision lost in so doing by modeling the effects of actions 
as stochastic. 

It is possible to use a grid discretization of the continuous 
belief space, but the high dimensionality of that space makes 
it infeasible for most problems of interest. Instead, we pursue 
a discretization strategy that is more directly motivated by 
the uncertainty in the problem. When there is uncertainty
with respect to the configuration of the robot or obstacles, we
will generally want to execute actions that reduce uncertainty, 
while making progress toward a goal. There are two ways 
to reduce uncertainty through action: one is to act to obtain 
observations that contain information about the underlying 
state; the other is to take actions that are "funnels," mapping 
large sets of possible initial states to a smaller set of resulting 
states. 

We start by considering the MDP, defined over complete 
configurations of the robot and object, that underlies our 
problem, and construct abstract state and action spaces and 
an abstract state transition model on those spaces. We will 
use the abstract MDP as the basis for an abstract POMDP. We 
construct the abstract space for the MDP by choosing a set of 
abstract actions [22] and using them to induce the state space. 
We will work with a set of "guarded" compliant motions as 
our action space. A guarded motion causes the robot to move 
along some vector until it makes or breaks a contact or reaches 
the limit of the workspace. Our action set includes guarded 
motions through free space, as well as compliant motions, in 
which the robot is constrained to maintain an existing contact 
while moving to acquire another one. Note that these actions 
serve as "funnels," producing configurations with multiple 
contacts between the robot and an object, and generating 
information about the underlying state. In the current work, 
we allow the robot to move only one degree of freedom at 
a time: there are motions in two directions for each DOF, 
which attempt to move in the commanded direction while 
maintaining the existing set of contacts, if possible. 
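To make the notion of a guarded motion concrete, here is a minimal sketch on a discretized planar workspace. The grid representation, the names `NEIGHBORS`, `contacts`, and `guarded_move`, and the termination labels are assumptions made for illustration; the real controller works in continuous space with force feedback, and compliance is only implied here by the fact that the contact set is preserved while sliding.

```python
# Unit steps for each commanded direction (one degree of freedom at a time).
NEIGHBORS = {"left": (-1, 0), "right": (1, 0), "down": (0, -1), "up": (0, 1)}


def contacts(pos, occupied):
    """Set of directions in which the cell adjacent to pos is occupied."""
    x, y = pos
    return frozenset(name for name, (dx, dy) in NEIGHBORS.items()
                     if (x + dx, y + dy) in occupied)


def guarded_move(pos, direction, occupied, width, height):
    """Move along direction until the contact set changes, the way is
    blocked, or the workspace limit is reached. Returns (position, reason)."""
    dx, dy = NEIGHBORS[direction]
    start_contacts = contacts(pos, occupied)
    x, y = pos
    while True:
        nx, ny = x + dx, y + dy
        if not (0 <= nx < width and 0 <= ny < height):
            return (x, y), "limit"
        if (nx, ny) in occupied:
            return (x, y), "blocked"
        x, y = nx, ny
        if contacts((x, y), occupied) != start_contacts:
            return (x, y), "contact-change"


# Usage: a two-cell-wide block resting on the bottom row of a 6 x 4 workspace.
block = {(2, 0), (3, 0)}
landed = guarded_move((2, 3), "down", block, width=6, height=4)
slid = guarded_move(landed[0], "right", block, width=6, height=4)
```

In this toy run the finger descends until it touches the top of the block, then slides right while keeping the "down" contact until that contact is lost at the block's edge.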

Abstraction methods for MDPs [5], derived from abstraction 
methods for finite-state automata, take an underlying set of
states, a set of actions and a reward function, and try to 
construct the minimal abstract state space. This new state space 
is a partition of the original state space, the regions of which 
correspond to the abstract states. The abstract space must have 
the properties that, for any sequence of actions, the distribution 
of sequences of rewards in the abstract model is the same as it 
would have been in the original model, and, furthermore, that 
any two underlying states that are in the same abstract state 
have the same expected future value under the optimal policy. 

So, given a commitment to guarded motions as our action 
set, the known deterministic continuous dynamics of the 
robot, and a specification of a goal (or, more generally, a 
reward function), we can apply model-minimization methods 
to determine an abstract model that is effectively equivalent to 
the original. We obtain a large reduction through not having 
to represent the free space in detail, because after the first 
action, the robot will always be in contact, or in a very limited 
subspace of the whole free space. In addition, large regions
of the state space will behave equivalently under funneling 
actions that move until contact. 
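The closure property can be illustrated with a generic partition-refinement loop for deterministic dynamics. This is a sketch of the general idea only, not the procedure used in this work (see [7] for that); the names `refine_partition`, `step`, and `label` are hypothetical, and the corridor example is invented for illustration.

```python
def refine_partition(states, actions, step, label):
    """Split blocks until every state in a block has, for each action,
    a successor in the same block as the other members' successors.

    step(s, a) -> next state (deterministic dynamics, assumed given)
    label(s)   -> initial block label, e.g. (observation, reward class)
    """
    block_of = {s: label(s) for s in states}
    while True:
        # A state's signature is its own block plus its successors' blocks.
        signature = {
            s: (block_of[s], tuple(block_of[step(s, a)] for a in actions))
            for s in states
        }
        ids = {}
        new_block_of = {s: ids.setdefault(signature[s], len(ids)) for s in states}
        if len(ids) == len(set(block_of.values())):
            return new_block_of  # no block was split: the partition is closed
        block_of = new_block_of


# Usage: a six-cell corridor where each guarded move runs to the far wall.
cells = list(range(6))
moves = ["left", "right"]
run_to_wall = lambda s, a: 0 if a == "left" else 5
blocks = refine_partition(cells, moves, run_to_wall,
                          label=lambda s: "goal" if s == 5 else "free")
```

Because both moves funnel every cell to a wall, the five non-goal cells stay in a single abstract state, which is the kind of reduction described above.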

We begin by assuming an idealized deterministic dynamics, 
derived from the geometry, both in free space and in contact, 
and construct an abstract state space using those dynamics. 
Given that abstract state space, we will go back and compute 
more realistic transition probabilities among the abstract states.
Finally, we will feed this resulting model into an approximate
POMDP solver to derive a policy. Details of this process can 
be found in [7]. 

C. Example 

To give an intuition for the approach, we present a very 
simple example. Consider a two-dimensional Cartesian robot 
with a single rectangular finger, and a rectangular block on a 
table. The robot has contact sensors on the tip and each side 
of the finger, and so it can make eight possible observations 
(though combinations involving contact on both sides of the 
finger are impossible in this scenario). 

The action space is the set of compliant guarded moves up, 
down, left, and right. If the robot has a contact in one direction, 
say down, and tries to move to the left, it does so while 
attempting compliantly to maintain contact with the surface 
beneath. The robot moves until its observed contacts change. 
A motion can also be terminated when the robot reaches its 
limits (we'll assume a rectangular workspace). 

The robot's "goal" is to have the tip of its finger on top 
of the block. It has an additional action, called "lift" which 
is intended to mean that it could engage a vacuum actuator, 
or simply declare success. If the robot lifts when it has its 
finger on top of the block, then it gets a reward of +10 
and the trial is terminated. If it lifts when it is in any other 
configuration, then it gets a reward of -10, and the trial is 
terminated. Additionally, it gets a reward of -1 on each step 
to encourage shorter trajectories. 
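The reward structure of this example can be written as a small function. The names `LIFT`, `reward`, and `episode_return` are hypothetical, and this sketch assumes that the -1 step cost applies to motion steps only and not to the terminating lift itself; the text leaves that detail open.

```python
from typing import List, Tuple

LIFT = "lift"


def reward(on_top_of_block: bool, action: str) -> Tuple[float, bool]:
    """Return (reward, episode_terminated) for one step of the example task."""
    if action == LIFT:
        return (10.0 if on_top_of_block else -10.0), True
    return -1.0, False  # every motion step costs 1


def episode_return(actions: List[str], on_top_at_lift: bool) -> float:
    """Undiscounted total reward of an action sequence that ends with a lift."""
    total = 0.0
    for action in actions:
        r, done = reward(on_top_at_lift, action)
        total += r
        if done:
            break
    return total
```

For instance, two motions followed by a successful lift return 8 under these assumptions, and the same sequence ending in a failed lift returns -12.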

The configuration space for this robot is two dimensional, 
with reachable configurations everywhere except for a box that
is "grown" by the dimensions of the finger. Figure 1A shows
how the configuration space is initially segmented based on
tactile observations (various one-dimensional loci of contact
configurations are shown as thin rectangular boxes). The goal
configurations are further differentiated into their own region
in Figure 1B. Finally, we can propagate those regions so that
we arrive at a segmentation that is closed, in the sense that
all of the states in a single region, given a particular action,
transition to a particular other region, as shown in Figure 1C.

Even with completely deterministic transition and observa- 
tion models, the POMDP is a useful tool. If, for instance, 
we examine the optimal policy for the initial belief state in 
which the robot could be in state 2A, 2C, or 2E, with equal
probability, then we get the partial policy graph shown in
Figure 1D. In a policy graph, the nodes are labeled with actions
and the arcs with observations. This policy has the robot begin
by moving down until contact. At this point, it could be in
regions 1B, 1A, or 1C. The policy specifies that it move to
the right, putting it in one of regions 5, 2B, or 1C. Each of
these regions has a different observation (tip-right, none, or
tip), and so the rest of the strategy is determined in each case.

If we add a significant amount of noise to the actions and
observations, so that they have their "nominal" outcome about
0.8 of the time, and generate erroneous readings or results with
the remaining probability, this problem becomes much more
complicated, and much too difficult to solve by hand. We can
solve this model for the optimal policy, which embodies a
fairly different strategy. Even the initial move is different: it
asks the robot to move all the way over to the left at the very
beginning, which moderately reliably funnels all the initial
states into one, removing some of the initial uncertainty at the
cost of performing an additional action. If the actions were
even noisier, the policy might ask that the robot move to the
left multiple times, to further reduce uncertainty.

Fig. 1. A. Observation partition; B. Observation and reward partition; C.
Closed partition; D. Partial policy graph for robot starting in an unknown
state above the table, with a deterministic transition and observation model.
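One simple way to turn a deterministic abstract model into a noisy one of the kind described above is sketched below. The function name `noisy_transitions` is hypothetical, and spreading the remaining 0.2 probability uniformly over all other states is an assumption of this sketch; the text does not say how the erroneous outcomes are distributed.

```python
import numpy as np


def noisy_transitions(next_state, n_states, p_nominal=0.8):
    """Build T[a, s, s2] from a deterministic table next_state[a][s].

    The nominal successor gets probability p_nominal; the remainder is
    spread uniformly over the other states (assumption; needs n_states > 1).
    """
    n_actions = len(next_state)
    leak = (1.0 - p_nominal) / (n_states - 1)
    T = np.full((n_actions, n_states, n_states), leak)
    for a in range(n_actions):
        for s in range(n_states):
            T[a, s, next_state[a][s]] = p_nominal
    return T


# Usage: three abstract states, two actions.
T = noisy_transitions([[0, 0, 1], [1, 2, 2]], n_states=3)
```

A more faithful error model would concentrate the leaked probability on geometrically nearby regions rather than on all states equally.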

III. Solving the POMDP 

Even for the simplest problems, as soon as we add noise, it 
is infeasible to solve the resulting POMDPs exactly. We have 
used HSVI [20], a form of point-based value iteration, which 
samples belief states that have a relatively high probability of 
being encountered, and concentrates its representational and 
computational power in those parts of the belief space. 

HSVI returns policies in the form of a set of $\alpha$-vectors
and associated actions. The expected discounted sum of rewards
when executing this policy from some belief state $b$ is

$$V(b) = \max_i \; b \cdot \alpha_i$$

and the best action is the action associated with the maximizing
$\alpha$-vector. The $\alpha$-vectors define hyperplanes in the belief
space, and the maximization over them yields a value function
that is piecewise-linear and convex. By construction, each of
the $\alpha$-vectors is maximal over some part of the belief space,
and the space is partitioned according to which $\alpha$-vector is
maximizing over that region.
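Evaluating such a policy at a belief state is a single matrix-vector product followed by a maximization, as the following sketch shows. The function name `value_and_action` and the two toy vectors are assumptions for illustration and do not come from an actual HSVI run.

```python
import numpy as np


def value_and_action(b, alphas, alpha_actions):
    """Return (V(b), best action) for a set of alpha-vectors.

    alphas:        array of shape (k, n), one alpha-vector per row
    alpha_actions: list of length k, the action tied to each vector
    """
    scores = alphas @ b            # dot product of b with every alpha-vector
    i = int(np.argmax(scores))     # index of the maximizing vector
    return float(scores[i]), alpha_actions[i]


# Usage: two states ("on block", "off block") and two toy alpha-vectors.
alphas = np.array([[10.0, -10.0],   # value of lifting now
                   [2.0, 2.0]])     # value of gathering more information first
alpha_actions = ["lift", "move-down"]
uncertain = value_and_action(np.array([0.5, 0.5]), alphas, alpha_actions)
confident = value_and_action(np.array([0.9, 0.1]), alphas, alpha_actions)
```

With these toy numbers the uncertain belief selects the information-gathering motion, while the confident belief selects the lift.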



So, to execute a policy, we apply a state estimator as described
earlier. The state estimator starts in some initial belief
state, and then consumes successive actions and observations,
maintaining the Bayes optimal belief state. To generate an
action, the current belief state is dotted with each of the
$\alpha$-vectors, and the action associated with the winning $\alpha$-vector
is executed. This process is quite efficient, even if the policies
were slow^ to derive off-line.
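The execution loop just described can be sketched as follows. All names (`select_action`, `belief_update`, `execute_policy`, `act_and_sense`) and the toy model are assumptions for illustration; `act_and_sense` stands in for whatever interface commands the robot and returns the index of the resulting observation.

```python
import numpy as np


def select_action(b, alphas, alpha_actions):
    """Action tied to the alpha-vector with the largest dot product with b."""
    return alpha_actions[int(np.argmax(alphas @ b))]


def belief_update(b, a, o, T, Z):
    """Bayes filter step; assumes the observation has nonzero probability."""
    unnormalized = (b @ T[a]) * Z[:, o]
    return unnormalized / unnormalized.sum()


def execute_policy(b0, alphas, alpha_actions, T, Z, act_and_sense,
                   terminal_action, max_steps=50):
    """Alternate action selection and belief update until the terminal action."""
    b = np.asarray(b0, dtype=float)
    history = []
    for _ in range(max_steps):
        a = select_action(b, alphas, alpha_actions)
        history.append(a)
        if a == terminal_action:
            break
        o = act_and_sense(a)          # execute on the robot, read the sensors
        b = belief_update(b, a, o, T, Z)
    return history, b


# Usage: action 0 probes (state unchanged), action 1 lifts and ends the trial.
T = np.array([np.eye(2), np.eye(2)])
Z = np.array([[0.8, 0.2], [0.2, 0.8]])     # observation 1 = tip contact
alphas = np.array([[-10.0, 10.0], [1.0, 1.0]])
history, final_b = execute_policy([0.5, 0.5], alphas, [1, 0], T, Z,
                                  act_and_sense=lambda a: 1, terminal_action=1)
```

Each iteration costs one belief update plus one product with the alpha-vector matrix, which is why on-line execution is cheap even when the off-line solve is not.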

IV. Results 

As a proof of concept, we have tested the approach de- 
scribed above in two planar problems: one involving placing 
one finger on a stepped block and one for two-finger grasping 
of a block. We derive stochastic policies from simulations on a 
simple planar model. We then run the policy for the stochastic 
model in a high-fidelity dynamics simulation and measure av- 
erage total reward per episode. Note that the stochastic model 
and the high-fidelity model differ in some substantial details: 
the dimensions of the block and the geometry of the fingers are 
different and the actual sensor and detailed control behavior 
are different. Therefore, some of the trajectories that are 
most common in the high-fidelity simulation have relatively 
low probability in the stochastic model. These simulations
give us a measure of how much the mis-estimation of the
probabilities in the stochastic model decreases performance.
As a comparison, we report the results for a simple but 
reasonable fixed strategy, as well. 

Single finger/Stepped block: This domain is similar to the 
one described in the example in section II-C except that the 
block is "stepped". The goal is to place the finger in the corner 
at the left step. Note that, since the robot is lacking position 
information, the goal is locally indistinguishable to the sensors 
from the corner where the block rests on the table. The rewards 
are similar to the earlier example (+15 for reaching the goal,
-50 for lifting in the wrong state, -1 for each motion) except 
that we also penalize (-5) being in the states at the limits of the 
designated problem workspace. This discourages long motions 
that leave the vicinity of the block. The abstract state space 
for this problem has 40 states. 

A trajectory derived by following the stochastic policy for
this problem is shown in Figure 2. This policy, found by
solving the POMDP formulated as described above, succeeded
in 466 out of 506 trials (92%) in the high-fidelity simulation.
Its average reward is -1.59. We can compare this to a fixed
policy that simply moves the hand in a fixed pattern of
left, down, right, right, up, right, right, right (LDRRURRR).
This fixed policy succeeded in 154 out of 190 trials (81%),
with an average reward of -10.632. The POMDP policy is
considerably more robust than the fixed strategy.

Figure 3 shows the deterministic version of the POMDP
policy, derived in the absence of noise. While one could
imagine hand-coding such a policy, it would take some time,
and yet still not be sufficiently robust. The stochastic policy,
shown in Figure 4, is qualitatively different. Because the
actions and observations are much less reliable, it has to handle
many more cases, including rational responses to observation
sequences that would have been impossible in the deterministic
model.

^For large problems, the POMDP approximation methods may become
slow, but for all the results reported here, the POMDP solutions ran in under
10 minutes.

Fig. 3. One-finger policy for deterministic stepped block model.

Fig. 4. One-finger policy for noisy stepped block model.

Two-dimensional Grasping: A more interesting domain is
two-fingered grasping in two dimensions. In this case, the
robot has three degrees of freedom: motion along the x and z
axes, as well as opening and closing the parallel fingers. The
abstract state space for the problem of grasping a block has
408 states. Figure 5 shows a policy we automatically derived
for grasping a block with two fingers, in the absence of noise.
In this figure, the contacts for the left finger and right finger
are shown on the arcs connected by a hyphen, e.g. TIP-NO
indicates tip contact detected on the left finger and no contact
detected on the right finger. The -G notation indicates that the
fingers are at their wide open "limit stop". Each finger has tip,
inside and outside sensors.

Fig. 2. Sample run of one-finger policy on stochastic stepped block model.

An example trajectory derived by following the stochastic
policy for this problem is shown in Figure 6. This policy,
found by solving the POMDP formulated as described above
(and encoded in approximately 1000 $\alpha$-vectors), succeeded in
115 out of 115 trials (100%) in the high-fidelity simulation.
Its average reward is 4.0. We can compare this to a fixed
policy that simply moves the hand in a fixed pattern of
LDRRURRDDDG. The fixed policy succeeded in 86 out of
113 trials (76%), with an average reward of -17.24, which is
significantly worse than the behavior of the POMDP policy.

V. Experiments With a Physical Robot 

We are currently experimenting with some of these policies 
on our robot platform, which is a 7-DOF Barrett arm with a
3-fingered, 4-DOF Barrett hand. On the Barrett hand, two of
the three fingers can spread together to perform pinch, tripod, 
or even hook grasps. When the two outer fingers are in their 
maximally spread position, all three fingers are together, and 
thus the hand can be viewed as a one-fingered robot. This 
configuration is shown in Figure 7. If the three fingers together 
are placed at an appropriate location on either of the two 
objects discussed previously, and the depth and width of the 
object are not too large, the outer two fingers can be swung 
around to perform a pinch grasp. Thus, in most of our initial 
experiments, we use the one-finger policies to position the 
hand for grasping. 

A. Sensors 

Fingers outfitted with sensors have three pressure sensors,
inside, outside, and tip, as shown in Figure 8. The finger is also
covered with cellular urethane foam to add both higher friction
and compliance. Figure 9 is a diagram of the sensors used
for the outside and inside of a sensing finger. These sensors
consist of two force-sensing resistor (FSR) sensors sandwiched
between two thin aluminum plates, with two tiny squares of
foam focusing the force onto the FSR sensor pads. This design
ensures that a contact force anywhere on the surface of the
aluminum plate is detected despite the tiny surface area of the
FSR pads. In addition, since the contact force applied to the
plate is shared between the two sensors according to the distance
between the applied force and each sensor's pad, one can use
the two sensors' force information to estimate the location of
the applied force along the length of the sensor. Construction
is simple and requires only cutting two small rectangles of
aluminum, scotch-taping the FSR sensors to the bottom plate,
soldering wires to the FSR leads, sticking small squares of
sticky-backed cellular urethane foam onto the sensor pads, and
finally loosely attaching the top plate with more tape.

Fig. 7. Barrett arm and hand in one-fingered configuration.
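As a rough illustration of how a contact location could be recovered from the two FSR readings, the sketch below treats the top plate as an idealized rigid beam resting on two point supports, so that each pad's share of the load falls off linearly with its distance from the contact. That idealization, the function name `estimate_contact`, and the threshold `min_total` are assumptions of this sketch, not the calibration used for the sensors described here.

```python
from typing import Optional, Tuple


def estimate_contact(f1: float, f2: float, pad_spacing_cm: float,
                     min_total: float = 1e-6) -> Optional[Tuple[float, float]]:
    """Estimate (position, total force) of a point contact on the plate.

    f1, f2:         calibrated force readings at pad 1 and pad 2
    pad_spacing_cm: distance between the two pads
    Position is measured from pad 1 toward pad 2. Returns None when the
    total force is below min_total, i.e. no contact is detected.
    """
    total = f1 + f2
    if total < min_total:
        return None
    position = pad_spacing_cm * f2 / total  # moment balance about pad 1
    return position, total
```

Equal readings place the contact midway between the pads, and a larger reading at one pad pulls the estimate toward that pad; the foam layer and plate flex in a real sensor would call for an empirical calibration on top of this.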

While our POMDP policies currently only make use of
contact/no-contact information, we plan to eventually use the
location information to estimate likely orientations of the hand
with respect to an object with a known model. Unlike the
inner and outer sensors, the tip sensor has a tiny surface area
and thus only has space for one FSR, preventing any sort of
contact location estimate. Figure 10 is a graph of the estimated
vs. actual position detected when a pointlike contact is applied
to the layer of foam covering the sensor. Figure 11 is a graph
of the total force detected at both sensors for varying applied
forces. The dotted blue line shows the sensor response when
the force is applied at the center of the sensor, the dashed red
and green lines show the response at 1 cm on either side of
the center lengthwise, and the solid magenta line shows the
response at 0.5 cm from the center widthwise. While there are
many sensors with similar or better capabilities on the market,
another advantage of this design is its cost: approximately $12
in parts and about 10 minutes of assembly time.

Fig. 5. Two-finger grasping policy for deterministic model.

Fig. 6. Sample run of two-finger grasp policy in high-fidelity simulation.
Panels: Go down, see Tip-Tip; Go right (stuck), see Tip-Tip; Go right,
see In&Tip-Tip; Go down, see In&Tip-Tip; Close, see In&Tip-In&Tip;
Lift (success).

Fig. 8. Finger with sensors.

B. Single finger results 

Stepped block results: In our first set of experiments, we 
ran a simple POMDP that would place the Barrett hand in 
its one-fingered configuration at the corner of the right step 
of a stepped block, a position from which it can grasp the 
top block. To compare with the POMDP results, we used a 
closed-loop fixed policy that goes right, down, left until In, 
up until In is gone, then alternates left and down until
In-Tip. This policy succeeded 49 out of 50 times, for an average
reward (+15 for success, -50 for failure, -1 for each step, -5
for hitting a boundary) of 2.1. However, the one failure was
not the fault of the policy: a spurious In-Tip was seen on the
way to the corner, making the run indistinguishable in terms
of sensor observations from a successful one; had it been a
success, the reward would have been 3.4. A POMDP made
to ignore the boundary cost takes nearly the same trajectory,
with an extra two steps to verify that it has gone over the
corner step successfully. Although it succeeded in all 50 of 50
runs, the extra steps lower the average reward to 1.6. In this
particular case, the closed-loop policy nearly always succeeds
because actions that move up the side of the block without
backtracking have a very low rate of failure; if that were the
best trajectory to take in all cases, a POMDP policy would be
overkill. A POMDP that takes into account the boundary cost
and thus goes down first to avoid hitting the right boundary,
on the other hand, has to deal with an extremely high rate
of failure. It is a good indicator of the robustness of POMDP
policies that running such a policy succeeded in 10 of 10 runs
with an average reward of 4.0, even in the face of a number
of spurious contacts and early-aborted actions.

Fig. 9. Diagram of an FSR sensor that detects position as well as force of
contact.

Fig. 10. Estimated vs. actual position of contact on sensor.

Fig. 11. Sum of estimated forces for varying forces applied at center of
sensor, left and right length offsets of 0.8 cm, and 0.5 cm width offset.

Since the actions had been tuned to that particular stepped 
block and thus had very few failures on the closed-loop fixed 
policy, we tried a smaller stepped block with the same action 
parameters to see how the policies would compare. For the 
smaller stepped block, moving up the side of the bottom block 
sometimes fails by missing the middle step. This is because 
while following the side, the controller tries to keep a constant 
pressure on the object while sliding, and while adjusting the 
depth, it is allowed to miss contacts for a tunable number 
of steps before regaining contact. This results in curving 
around corners before the controller decides that the contact is 
definitely lost and backtracks to regain contact before ending 
the action. On the larger block, there is room to spare to figure 
out that the bottom side is lost, but on the smaller stepped 
block, the finger can encounter the side of the top block while 
trying to regain contact and thus miss the middle ledge entirely. 
Because of this, the closed-loop fixed policy failed in 2 of 5 
runs. The same POMDP as before (that takes into account 
the boundary cost), on the other hand, still succeeded in 5 
of 5 runs. This includes one run that goes all the way to the 
right boundary before returning, missing the middle step on 
the way up the side of the stepped block, checking to make 
sure it has gone over the corner, then moving to what it 
expects will be the goal. At that point it unexpectedly sees 
an outside contact, figures out that it has missed the middle
step, and recovers successfully. Snapshots from this run are
shown in Figure 12. Because the POMDP is built to deal with 
unexpected transitions up to two states away in each move, it 
can recover from such situations with a robustness that would 
be difficult to achieve with hand-written policies.

C. Two finger results 

We have just begun to run experiments on a two-fingered 
hand; as in simulation, our initial experiments involve trying 
to grasp a simple block. A fixed closed-loop policy (right, 
down, left until Out is seen on the left finger, up until Out is 
gone, left until In is seen on the left finger, down until Tip is 
seen on either finger, then grasp and lift) for this case has to 
do many more steps than a POMDP (or even a hand-written 
FSA) can get away with, but even so, it succeeded in 5 of 5 
runs. The POMDP policy (which starts by going down instead) 
for this situation also succeeded in 5 of 5 runs, although few 
enough errors in the actions were seen that a POMDP policy 
is probably overkill for this situation; again, the block size is 
ideal for the current actions, and more errors might be seen 
on differently sized blocks.

This work is an initial attempt to apply POMDPs to the 
problem of robot manipulation. It demonstrates that consid- 
erable advantages in robustness can be gained through the 
POMDP formulation. Important next steps will be to work 
with more general object shapes, to address uncertainty in 
rotation and shape, to allow objects to slide or tip, and to 
handle interactions with other objects. 
Fig. 12. Sample run of one finger on a small stepped block dealing successfully with accidentally skipping the middle step.